Action recognition based on human skeleton structure is currently a thriving research field. This is mainly due to recent advances in capture technologies and skeleton extraction algorithms. In this context, we observed that 3D skeleton-based actions share several properties with handwritten symbols, since both result from human performance. We accordingly hypothesize that...
Automatically recognising facial emotions has drawn increasing attention in computer vision. Facial-landmark-based methods are among the most widely used approaches to this task. However, these approaches do not provide good performance. Thus, researchers usually combine them with further information, such as textural and audio information, to increase the recognition rate. In this paper we propose...
This paper presents a sparse-representation-based image inpainting method using local patch analysis and geometric-structure-based feature extraction. In local patch analysis, we approximate the target region by a weighted average of local patches that occur frequently within a neighborhood. Local patch statistics are applied to find the most relevant neighbors for each target patch. Further...
A genetic programming (GP)-based framework to learn the effective feature representation for image dehazing is proposed in this work. In GP, an individual program is randomly generated and genetically evolved to achieve the desired goal. To make GP estimate haze in an input image, a set of operators and operands is designed, each of which is a primitive of a GP program. Specifically, we provide four...
Human action recognition from videos has wide applicability and has attracted significant interest. In this work, to better identify spatio-temporal characteristics, we propose a novel 3D extension of Gradient Location and Orientation Histograms, which provides discriminative local features representing not only the gradient orientations but also their relative locations. We further propose a human action...
Content-based indexing is critical to effective access to multimedia data. To this end, visual data is often annotated with textual content to bridge the semantic gap. In this paper, we present a method to generate frame-level, fine-grained annotations for a given video clip. Access to frame-level, fine-grained annotations leads to rich, dense and meaningful semantic associations between...
Road detection from images is a challenging task in computer vision. Previous methods are not robust, because their features and classifiers cannot adapt to different circumstances. To overcome this problem, we propose to apply unsupervised feature learning for road detection. Specifically, we develop an improved encoding function and add a feature selection process to obtain robust and discriminative...
We propose to learn semantic spatio-temporal embeddings for videos to support high-level video analysis. The first step of the proposed embedding employs a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Gated Recurrent Unit encoders for capturing longer-term temporal structure of the CNN features...
Image registration is an important and fundamental problem in computer vision and image processing. Although there are currently a large number of image registration algorithms, such as RANSAC and its extensions, image registration under very noisy conditions remains difficult when too few correct corresponding points can be obtained. This paper solves this issue by introducing a random resample...
In this paper, we propose a new local descriptor for action recognition in depth images. The proposed descriptor relies on surface normals in 4D space of depth, time, spatial coordinates and higher-order partial derivatives of depth values along spatial coordinates. In order to classify actions, we follow the traditional Bag-of-words (BoW) approach, and propose two encoding methods termed Multi-Scale...
With the success of deep learning in the last few years, the object detection community has shifted from processing exhaustive sliding windows to a smaller set of object proposals using more powerful, deep visual representations. Object proposals increase accuracy and speed up the detection process by reducing the search space. In this paper we propose a novel idea of filtering irrelevant edges using...
A novel similarity-covariant feature detector is presented that extracts points whose neighborhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that...
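The transition count at the heart of the ring test can be sketched as follows. This is only an illustration under stated assumptions, not the detector from the abstract: it handles a single ring, labels samples as bright or dark by comparing them against a hypothetical `threshold` parameter (the abstract does not say how labels are assigned), and omits the geometric constraints entirely. The function names are made up for this example.

```python
from typing import Sequence, Tuple


def count_transitions(ring: Sequence[float], threshold: float) -> Tuple[int, int]:
    """Count (dark-to-bright, bright-to-dark) transitions around a closed ring.

    `ring` holds intensities sampled in circular order; the threshold-based
    labelling is an assumption made for this sketch.
    """
    labels = [value > threshold for value in ring]  # True = bright, False = dark
    dark_to_bright = 0
    bright_to_dark = 0
    n = len(labels)
    for i in range(n):
        current, following = labels[i], labels[(i + 1) % n]  # wrap around the ring
        if not current and following:
            dark_to_bright += 1
        elif current and not following:
            bright_to_dark += 1
    return dark_to_bright, bright_to_dark


def has_saddle_pattern(ring: Sequence[float], threshold: float) -> bool:
    """True if the ring shows exactly two transitions of each kind."""
    return count_transitions(ring, threshold) == (2, 2)
```

A ring that alternates between two dark arcs and two bright arcs passes the check, while a ring with a single bright arc does not.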
We construct a robust and precise multi-orientation text detection system for scene images that can extensively locate possible characters with multi-information fusion. In our method, an adaptive multi-channel character grouping algorithm is first proposed to extract all possible character candidates robustly, and an AdaBoost classifier is then used to identify character candidates as characters...
We propose a machine-learning-based approach to real-time detection and classification assistance for images from unknown environments. While systems for detecting and classifying regular structures like faces in still images are well established, the task of, e.g., detecting new morphotypes/objects in an environment is much more complex. The morphotypes/objects are not guaranteed to have a priori known...
The performance of an object detection system relies heavily on two components: an object model to capture the compositional relationship among the object body and its parts, and a feature representation to describe object appearance. In this work, we present an empirical study of combining two such state-of-the-art components: Deformable Part Model (DPM), a proven effective and flexible part-based...
This paper presents a method for detecting pedestrians by leveraging multi-spectral image pairs. Our approach is based on the observation that a multi-spectral image, especially a far-infrared (FIR) image, enables us to overcome inherent limitations of pedestrian detection under challenging circumstances, such as dark environments. For that task, multi-spectral color-FIR image pairs are used...
In this paper, we discuss a novel approach to incrementally construct a rule ensemble. The approach constructs an ensemble from a dynamically generated set of rule classifiers. Each classifier in this set is trained by using a different class ordering. We investigate criteria including accuracy, ensemble size, and the role of starting point in the search. Fusion is done by averaging. Using 22 data...
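Fusion by averaging can be sketched as below. This is a minimal illustration, assuming each rule classifier outputs one score per class for a given sample; the abstract does not state what is being averaged, so that assumption, the function name `fuse_by_averaging` and the input layout are all hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_by_averaging(scores: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Average per-class scores over all classifiers and pick the top class.

    `scores[k][c]` is the (assumed) score that classifier k gives to class c.
    Returns the averaged score vector and the index of the winning class.
    """
    n_classifiers = len(scores)
    n_classes = len(scores[0])
    averaged = [
        sum(classifier_scores[c] for classifier_scores in scores) / n_classifiers
        for c in range(n_classes)
    ]
    # Ties are resolved in favour of the lowest class index.
    best_class = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, best_class
```

With this layout, adding a classifier to the ensemble only appends one more row to `scores`, which fits an ensemble that is built incrementally.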
Current practices for assessing infants' pain depend on the observer's subjective and potentially inconsistent judgment and require continuous monitoring by care providers. Therefore, pain may be misinterpreted or missed entirely, leading to misdiagnosis and over- or under-treatment. To address these shortcomings, current practices can be augmented with a machine-based assessment system that monitors various...
The Liquid State Machine (LSM) is a biologically plausible model of computation for recurrent spiking neural networks, which offers promising solutions to real-world applications in both software and hardware based systems. At the same time, deep feedforward rate-based neural networks such as convolutional neural networks (CNNs) have achieved great success in many computer vision related applications...
In driving support systems, it is not only necessary to detect the position of pedestrians, but also to estimate the distance between a pedestrian and the vehicle. In general approaches using monocular cameras, the upper and lower positions of each pedestrian are detected using a bounding box obtained from a pedestrian detection technique. The distance between the pedestrian and the vehicle is then...